City College of San Francisco
MATH 108 - Foundations of Data Science
Lecture 02: Cause and Effect¶
Associated Textbook Sections: 2.0, 2.1, 2.2, 2.3, 2.4, 2.5
Overview¶
Associations¶
Regularly Eating Chocolate Is Linked to 8 Percent Lower Heart Attack Risk¶
Image and Headline Source: everydayhealth.com
Study Source: European Journal of Preventive Cardiology
Study Observations¶
- Individuals (study subjects, participants, units, etc.)
- 336,289 US, Swedish, and Australian adults in several studies.
- Treatment
- Chocolate consumption
- Outcome
- Coronary heart disease risk
An Initial Question¶
Is there an association between chocolate consumption and heart disease risk?
An Answer¶
Yes. The reviewed article in the European Journal of Preventive Cardiology concludes that people who consumed chocolate more than once per week (or more than 3.5 times per month) were associated with fewer cases of heart disease than those who didn't.
A Follow Up Question¶
Does chocolate consumption lead to a reduction in heart disease? This question is often harder to answer.
An Answer¶
No. Several other factors could explain why fewer people who consumed chocolate regularly developed heart disease. For example, better health care access may go along with the financial freedom to consume more foods like chocolate, and could itself explain fewer cases of heart disease.
“Dr. Alice Lichtenstein, an American Heart Association volunteer and professor of nutrition science and policy at Tufts University, was more skeptical of the findings.”
A Data Science Origin Story¶
London, Early 1850s¶
Image Source: Wikipedia - 1854 Broad Street Cholera Outbreak
Miasmas, Miasmatism, Miasmatists¶
- Bad smells given off by waste and rotting matter
- Believed to be the main source of disease
- Staunch believers:
- Florence Nightingale (founder of modern nursing)
- Edwin Chadwick (Commissioner of the General Board of Health)
Suggested Remedies¶
Cholera, around 1850¶
- “fly to clene air”
- “a pocket full o’posies”
- “fire off barrels of gunpowder”
This might seem strange ...
COVID-19, 2020¶
- Inject disinfectant
- Sunlight
- Hydroxychloroquine
- Take 6 deep breaths, then cough while covering mouth
- Cannabis, cocaine, mangoes, onion, garlic, drinking water every 15 minutes, tea, eating ice cream, avoiding ice cream
John Snow, 1813-1858¶
Cholera Map¶
Image and Text Source: National Geographic - Mapping A London Epidemic
According to the National Geographic Society,
"This map of London was created by John Snow in 1854. London was experiencing a deadly cholera epidemic, when Snow tracked the cases on this map. The cholera cases are highlighted in black. Using this map, Snow and other scientists were able to trace the cholera outbreak to a single infected water pump."
from IPython.display import IFrame
IFrame(src="https://www.google.com/maps/embed?pb=!1m18!1m12!1m3!1d2482.9971371478814!2d-\
0.13879218398430104!3d51.51326851809472!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13\
.1!3m3!1m2!1s0x487604d4eb49ec6d%3A0xc4ff84518f83499d!2sJohn%20Snow!5e0!3m2!1\
sen!2sus!4v1642117611191!5m2!1sen!2sus",
width=800, height=600)
Causation¶
London Water Supply Service Regions¶
Image Source: British Library - John Snow's map showing the water supply in London, 1855
Image NOTE:
- Blue - Southwark and Vauxhall Company
- Red - Lambeth Company
- Purple - The area in which the pipes of both Companies are intermingled.
Comparison¶
- Treatment group
- Control group
- Does not receive the treatment
Snow’s “Grand Experiment” ... Study¶
“… there is no difference whatever in the houses or the people receiving the supply of the two Water Companies, or in any of the physical conditions with which they are surrounded …”
The two groups were similar except for the treatment.
Snow's Table¶
Python Imports and Settings¶
from datascience import *
import numpy as np
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
%matplotlib inline
snows_table = Table(['Supply Area', 'Number of Houses', 'Cholera Deaths']).with_rows([
['S&V', 40046, 1263],
['Lambeth', 26107, 98],
['Rest of London', 256423, 1422]
])
snows_table
| Supply Area | Number of Houses | Cholera Deaths |
|---|---|---|
| S&V | 40046 | 1263 |
| Lambeth | 26107 | 98 |
| Rest of London | 256423 | 1422 |
To compare the death totals across supply areas, calculate the rate of deaths per house.
death_per_house = snows_table.column('Cholera Deaths') / snows_table.column('Number of Houses')
snows_table.with_column('Deaths per House',
death_per_house)
| Supply Area | Number of Houses | Cholera Deaths | Deaths per House |
|---|---|---|---|
| S&V | 40046 | 1263 | 0.0315387 |
| Lambeth | 26107 | 98 | 0.00375378 |
| Rest of London | 256423 | 1422 | 0.00554552 |
Scale and round the rates to show whole numbers.
deaths_per_10000_houses = snows_table.column('Cholera Deaths') / snows_table.column('Number of Houses') * 10000
snows_table.with_column('Deaths per 10,000 Houses',
np.round(deaths_per_10000_houses))
| Supply Area | Number of Houses | Cholera Deaths | Deaths per 10,000 Houses |
|---|---|---|---|
| S&V | 40046 | 1263 | 315 |
| Lambeth | 26107 | 98 | 38 |
| Rest of London | 256423 | 1422 | 55 |
Scaling rates is a common presentation technique. This can provide clarity, but it can also be misleading!
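One way to check whether a scaled presentation distorts a comparison is to look at the ratio between two groups' rates. The sketch below uses plain Python rather than the `datascience` library so that it runs on its own; the helper name `death_rate` is made up for this illustration. It compares S&V with Lambeth using the counts from Snow's table.

```python
# Snow's counts, copied from the table above.
houses = {'S&V': 40046, 'Lambeth': 26107, 'Rest of London': 256423}
deaths = {'S&V': 1263, 'Lambeth': 98, 'Rest of London': 1422}

def death_rate(area, per=1):
    """Cholera deaths per `per` houses in the given supply area (illustrative helper)."""
    return deaths[area] / houses[area] * per

# The ratio of two rates does not depend on the scale factor.
ratio_per_house = death_rate('S&V') / death_rate('Lambeth')
ratio_per_10000 = death_rate('S&V', 10000) / death_rate('Lambeth', 10000)

# Rounding the scaled rates before dividing changes the ratio slightly.
ratio_rounded = round(death_rate('S&V', 10000)) / round(death_rate('Lambeth', 10000))

print(round(ratio_per_house, 2), round(ratio_per_10000, 2), round(ratio_rounded, 2))
```

Either way, the S&V death rate is roughly eight times the Lambeth rate. Scaling leaves that comparison intact, while rounding nudges it a little.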
A Key to Establishing Causality¶
If the treatment and control groups are similar apart from the treatment, then differences between the outcomes in the two groups can be ascribed to the treatment.
Confounding Variables¶
Confounding Factors Weaken a Causal Argument¶
If the treatment and control groups have systematic differences other than the treatment, then it might be difficult to identify causality.
Such differences are often present in observational studies.
When they lead researchers astray, they are called confounding factors.
Example of a Confounding Relationship¶
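A small simulation can make the idea concrete. The sketch below is not data from any study: every probability in it is invented for illustration. It assumes that wealth makes a simulated person more likely to eat chocolate and less likely to develop heart disease, while chocolate itself has no effect on disease at all.

```python
import random

random.seed(108)  # arbitrary seed so the run is repeatable

def simulate_person():
    """Return (wealthy, eats_chocolate, heart_disease) for one simulated adult."""
    wealthy = random.random() < 0.5
    # Invented probabilities: wealth raises chocolate eating and lowers disease risk.
    eats_chocolate = random.random() < (0.7 if wealthy else 0.3)
    # Chocolate plays no role in the disease probability.
    heart_disease = random.random() < (0.05 if wealthy else 0.15)
    return wealthy, eats_chocolate, heart_disease

people = [simulate_person() for _ in range(100_000)]

def disease_rate(rows):
    """Proportion of the given simulated people who have heart disease."""
    rows = list(rows)
    return sum(disease for _, _, disease in rows) / len(rows)

# Overall comparison: chocolate eaters appear healthier.
rate_chocolate = disease_rate(p for p in people if p[1])
rate_no_chocolate = disease_rate(p for p in people if not p[1])

# Comparison among wealthy people only: the gap largely disappears.
rate_wealthy_chocolate = disease_rate(p for p in people if p[0] and p[1])
rate_wealthy_no_chocolate = disease_rate(p for p in people if p[0] and not p[1])

print('All people:  ', round(rate_chocolate, 3), 'vs', round(rate_no_chocolate, 3))
print('Wealthy only:', round(rate_wealthy_chocolate, 3), 'vs', round(rate_wealthy_no_chocolate, 3))
```

In this made-up population, chocolate eaters show less heart disease overall even though chocolate does nothing. The association comes entirely from wealth, the confounding factor.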
Randomize! to Strengthen a Causal Argument¶
- If you assign individuals to treatment and control at random, then the two groups are likely to be similar apart from the treatment.
- You can (mathematically) account for variability in the assignment.
- Randomized Controlled Experiment:
- Randomly assign individuals to treatments
- Ensure one treatment is a control whose outcome is understood.
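Random assignment can be sketched in a few lines. The example below assumes a simple design with two groups of (nearly) equal size. The function name `randomly_assign` and the participant labels are hypothetical, chosen only for this illustration.

```python
import random

def randomly_assign(individuals, seed=None):
    """Shuffle the individuals and split them into treatment and control groups."""
    rng = random.Random(seed)   # a separate generator, optionally seeded for repeatability
    shuffled = list(individuals)
    rng.shuffle(shuffled)       # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant labels.
participants = [f'participant_{i}' for i in range(10)]
treatment, control = randomly_assign(participants, seed=2)

print('Treatment:', treatment)
print('Control:  ', control)
```

Because chance alone decides who lands in each group, no trait of the individuals (wealth, age, health) can systematically favor one group over the other.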
Be Careful ...¶
Regardless of what the dictionary says, in probability theory